skellam (Skellam distribution)#
The Skellam distribution models the difference of two independent Poisson counts: (X = Y_1 - Y_2), where (Y_1 \sim \text{Pois}(\mu_1)) and (Y_2 \sim \text{Pois}(\mu_2)).
It is a natural model for net counts: wins minus losses, arrivals minus departures, goals for minus goals against, etc.
This notebook uses the same parameterization as scipy.stats.skellam:
mu1 = mean/rate of the first Poisson process ((\mu_1 \ge 0))
mu2 = mean/rate of the second Poisson process ((\mu_2 \ge 0))
Learning goals#
By the end you should be able to:
recognize problems where a Skellam model makes sense (and when it doesn’t)
write down the PMF/CDF and key properties
derive the mean, variance, and log-likelihood
sample from the distribution using NumPy only
visualize PMF/CDF and validate formulas with Monte Carlo simulation
use scipy.stats.skellam for computation and fitting
Table of contents#
Title & Classification
Intuition & Motivation
Formal Definition
Moments & Properties
Parameter Interpretation
Derivations
Sampling & Simulation
Visualization
SciPy Integration
Statistical Use Cases
Pitfalls
Summary
import math
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
import os
import plotly.io as pio
from plotly.subplots import make_subplots
from scipy import optimize, special, stats
pio.templates.default = "plotly_white"
pio.renderers.default = os.environ.get("PLOTLY_RENDERER", "notebook")
np.set_printoptions(precision=6, suppress=True)
rng = np.random.default_rng(42)
1) Title & Classification#
Name: skellam (Skellam distribution)
Type: Discrete
Support: (k \in \mathbb{Z})
Parameters: (\mu_1, \mu_2)
Parameter space: (\mu_1 \ge 0,\ \mu_2 \ge 0) (not both zero for a non-degenerate distribution)
In SciPy, skellam also accepts a loc shift. In this notebook we focus on the standard, unshifted case (loc = 0).
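A one-line sanity check of the shift (the parameter values here are arbitrary): shifting loc by 1 simply translates the support by 1.
p_shifted = stats.skellam.pmf(3, 2.0, 1.0, loc=1)  # P(X + 1 = 3)
p_base = stats.skellam.pmf(2, 2.0, 1.0)            # P(X = 2)
float(p_shifted), float(p_base)  # identical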
2) Intuition & Motivation#
Think of two independent counting processes over the same time window:
(Y_1): “positive” events (arrivals, goals for, wins, upward jumps)
(Y_2): “negative” events (departures, goals against, losses, downward jumps)
The Skellam variable (X = Y_1 - Y_2) is the net balance.
Typical real-world use cases#
Sports score differential: if each team’s goals are modeled as independent Poisson counts, the goal difference is Skellam.
A/B event streams: net count of two independent event types in logs (e.g., adds − deletes).
Queueing / inventory: arrivals minus services (over short windows where rates are approximately constant).
Change scores: differences of counts in two conditions or two periods (when each is Poisson-like).
Relations to other distributions#
Poisson: if (\mu_2 = 0), then (X \sim \text{Pois}(\mu_1)); if (\mu_1 = 0), then (-X \sim \text{Pois}(\mu_2)).
Normal approximation: for large (\mu_1 + \mu_2), (X) is approximately (\mathcal{N}(\mu_1-\mu_2,\ \mu_1+\mu_2)).
Poisson–Binomial connection: if (N = Y_1 + Y_2), then (N \sim \text{Pois}(\mu_1+\mu_2)) and (Y_1 \mid N \sim \text{Bin}(N,\ \mu_1/(\mu_1+\mu_2))). Since (X = 2Y_1 - N), this gives an alternative view of Skellam (checked numerically after this list).
Additivity: differences of independent Poissons add: if (X_i = Y_{1,i}-Y_{2,i}) with independent Poisson components, then (\sum_i X_i) is Skellam with summed rates.
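A quick Monte Carlo check of the Poisson–Binomial connection above (the rates and the conditioning value N = 6 are arbitrary choices):
m1, m2 = 4.0, 2.0
y1 = rng.poisson(m1, size=200_000)
y2 = rng.poisson(m2, size=200_000)
n = y1 + y2
sel = n == 6  # condition on a single value of N
# E[Y1 | N] = N * mu1/(mu1+mu2), so the conditional mean over 6 should be ~ 2/3
float(y1[sel].mean() / 6), m1 / (m1 + m2)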
3) Formal Definition#
Let (Y_1 \sim \text{Pois}(\mu_1)) and (Y_2 \sim \text{Pois}(\mu_2)) be independent. Define [ X = Y_1 - Y_2. ]
PMF#
For (k \in \mathbb{Z}), the probability mass function is [ \mathbb{P}(X = k) = e^{-(\mu_1+\mu_2)}\left(\frac{\mu_1}{\mu_2}\right)^{k/2} I_{|k|}\left(2\sqrt{\mu_1\mu_2}\right), ] where (I_{\nu}(\cdot)) is the modified Bessel function of the first kind.
Boundary cases are intuitive:
if (\mu_2=0), then (X) is Poisson on ({0,1,2,\dots})
if (\mu_1=0), then (X) is the negative of a Poisson on ({0,-1,-2,\dots})
if (\mu_1=\mu_2=0), then (X=0) almost surely
CDF#
The cumulative distribution function is [ F(k) = \mathbb{P}(X \le k) = \sum_{j=-\infty}^{k} \mathbb{P}(X=j). ] There is no simple closed form in general; in practice it is computed numerically (e.g., by SciPy).
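Because the sum runs over all of (\mathbb{Z}), a direct evaluation must truncate the lower tail. A minimal sketch (the truncation point -200 is arbitrary but far in the tail):
mu1_chk, mu2_chk = 6.0, 4.0
ks_chk = np.arange(-200, 4)  # j = -200, ..., 3
cdf_brute = float(stats.skellam.pmf(ks_chk, mu1_chk, mu2_chk).sum())
cdf_scipy = float(stats.skellam.cdf(3, mu1_chk, mu2_chk))
cdf_brute, cdf_scipy  # should agree to many digits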
4) Moments & Properties#
Let (X \sim \text{Skellam}(\mu_1, \mu_2)).
Mean and variance#
[ \mathbb{E}[X] = \mu_1 - \mu_2,\qquad \mathrm{Var}(X) = \mu_1 + \mu_2. ]
Skewness and kurtosis#
Assume (\sigma^2 = \mu_1 + \mu_2 > 0). Then [ \text{skew}(X) = \frac{\mu_1 - \mu_2}{(\mu_1+\mu_2)^{3/2}}, ] and the excess kurtosis is [ \text{excess kurt}(X) = \frac{1}{\mu_1+\mu_2} ] (so the full kurtosis is (3 + 1/(\mu_1+\mu_2))).
MGF, characteristic function, and cumulants#
The moment generating function exists for all real (t) and is [ M_X(t) = \mathbb{E}[e^{tX}] = \exp\big(\mu_1(e^t-1) + \mu_2(e^{-t}-1)\big). ] The characteristic function is [ \varphi_X(t) = \mathbb{E}[e^{itX}] = \exp\big(\mu_1(e^{it}-1) + \mu_2(e^{-it}-1)\big). ] The cumulant generating function (K(t)=\log M_X(t)) is [ K(t) = \mu_1(e^t-1) + \mu_2(e^{-t}-1). ] Differentiating shows a neat pattern: odd-order cumulants equal (\mu_1-\mu_2), even-order cumulants equal (\mu_1+\mu_2).
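A quick Monte Carlo sanity check of the MGF formula (the sample size and the evaluation point t = 0.3 are arbitrary):
m1, m2 = 2.0, 1.0
t = 0.3
xs = rng.poisson(m1, size=200_000) - rng.poisson(m2, size=200_000)
mgf_mc = float(np.exp(t * xs).mean())
mgf_formula = float(np.exp(m1 * (np.exp(t) - 1.0) + m2 * (np.exp(-t) - 1.0)))
mgf_mc, mgf_formula  # should be close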
Entropy#
The (Shannon) entropy is [ H(X) = -\sum_{k\in\mathbb{Z}} \mathbb{P}(X=k)\,\log \mathbb{P}(X=k). ] There is no simple closed form in general, but it can be approximated accurately by truncating the tails.
Other useful properties#
Symmetry: if (\mu_1=\mu_2), the distribution is symmetric around 0.
Additivity: if (X_1\sim\text{Skellam}(\mu_{1a},\mu_{2a})) and (X_2\sim\text{Skellam}(\mu_{1b},\mu_{2b})) independent, then (X_1+X_2\sim\text{Skellam}(\mu_{1a}+\mu_{1b},\ \mu_{2a}+\mu_{2b})).
Normal approximation: when (\mu_1+\mu_2) is large, a Normal approximation is often excellent.
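A quick numeric check of that Normal approximation (the rates are arbitrary; we compare the exact PMF with the Normal density evaluated at integers):
m1, m2 = 60.0, 40.0
ks_na = np.arange(-20, 61)  # mean 20, sd 10, so roughly ±4 sd
pmf_exact = stats.skellam.pmf(ks_na, m1, m2)
dens_norm = stats.norm.pdf(ks_na, loc=m1 - m2, scale=np.sqrt(m1 + m2))
float(np.max(np.abs(pmf_exact - dens_norm)))  # small when mu1 + mu2 is large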
def _validate_mu(mu, name):
mu_float = float(mu)
if mu_float < 0.0:
raise ValueError(f"{name} must be >= 0")
return mu_float
def _validate_mu1_mu2(mu1, mu2):
mu1 = _validate_mu(mu1, "mu1")
mu2 = _validate_mu(mu2, "mu2")
return mu1, mu2
def skellam_logpmf(k, mu1, mu2):
"""Log PMF for Skellam(mu1, mu2) for integer k. Returns -inf outside support."""
mu1, mu2 = _validate_mu1_mu2(mu1, mu2)
k_arr = np.asarray(k)
out = np.full(k_arr.shape, -np.inf, dtype=float)
k_int = k_arr.astype(int)
is_int = k_int == k_arr
if not np.any(is_int):
return out
# Degenerate and boundary cases
if mu1 == 0.0 and mu2 == 0.0:
out[is_int & (k_int == 0)] = 0.0
return out
if mu1 == 0.0:
mask = is_int & (k_int <= 0)
out[mask] = stats.poisson.logpmf(-k_int[mask], mu2)
return out
if mu2 == 0.0:
mask = is_int & (k_int >= 0)
out[mask] = stats.poisson.logpmf(k_int[mask], mu1)
return out
kv = k_int[is_int]
v = np.abs(kv)
z = 2.0 * math.sqrt(mu1 * mu2)
# Use the exponentially scaled Bessel function for numerical stability:
# ive(v, z) = exp(-|z|) * I_v(z) => log I_v(z) = log ive(v, z) + |z|
log_iv = np.log(special.ive(v, z)) + abs(z)
log_ratio = math.log(mu1) - math.log(mu2)
out[is_int] = -(mu1 + mu2) + 0.5 * kv * log_ratio + log_iv
return out
def skellam_pmf(k, mu1, mu2):
return np.exp(skellam_logpmf(k, mu1, mu2))
def skellam_moments(mu1, mu2):
mu1, mu2 = _validate_mu1_mu2(mu1, mu2)
mean = mu1 - mu2
var = mu1 + mu2
if var == 0.0:
return {
"mean": 0.0,
"var": 0.0,
"skew": float("nan"),
"excess_kurt": float("nan"),
"kurt": float("nan"),
}
skew = (mu1 - mu2) / (var**1.5)
excess_kurt = 1.0 / var
return {
"mean": mean,
"var": var,
"skew": skew,
"excess_kurt": excess_kurt,
"kurt": 3.0 + excess_kurt,
}
def skellam_mgf(t, mu1, mu2):
mu1, mu2 = _validate_mu1_mu2(mu1, mu2)
t = np.asarray(t, dtype=float)
return np.exp(mu1 * (np.exp(t) - 1.0) + mu2 * (np.exp(-t) - 1.0))
def skellam_cf(t, mu1, mu2):
mu1, mu2 = _validate_mu1_mu2(mu1, mu2)
t = np.asarray(t, dtype=float)
return np.exp(mu1 * (np.exp(1j * t) - 1.0) + mu2 * (np.exp(-1j * t) - 1.0))
def sample_skewness(x):
x = np.asarray(x, dtype=float)
m = x.mean()
c = x - m
m2 = np.mean(c**2)
if m2 == 0.0:
return float("nan")
m3 = np.mean(c**3)
return float(m3 / (m2**1.5))
def sample_excess_kurtosis(x):
x = np.asarray(x, dtype=float)
m = x.mean()
c = x - m
m2 = np.mean(c**2)
if m2 == 0.0:
return float("nan")
m4 = np.mean(c**4)
return float(m4 / (m2**2) - 3.0)
def skellam_entropy(mu1, mu2, *, tail_prob=1e-12, base=math.e):
"""Approximate entropy by truncating tails with total probability `tail_prob`."""
mu1, mu2 = _validate_mu1_mu2(mu1, mu2)
if mu1 == 0.0 and mu2 == 0.0:
return 0.0
tail_prob = float(tail_prob)
if not (0.0 < tail_prob < 1.0):
raise ValueError("tail_prob must be in (0, 1)")
lo = stats.skellam.ppf(tail_prob / 2.0, mu1, mu2)
hi = stats.skellam.ppf(1.0 - tail_prob / 2.0, mu1, mu2)
if not (np.isfinite(lo) and np.isfinite(hi)):
mean = mu1 - mu2
sd = math.sqrt(mu1 + mu2)
lo = math.floor(mean - 20.0 * sd)
hi = math.ceil(mean + 20.0 * sd)
lo = int(math.floor(lo))
hi = int(math.ceil(hi))
ks = np.arange(lo, hi + 1)
logp = stats.skellam.logpmf(ks, mu1, mu2)
p = np.exp(logp)
h_nats = -np.sum(p * logp)
return float(h_nats / math.log(base))
def skellam_mom_estimates(x):
"""Method-of-moments estimates from sample mean and variance."""
x = np.asarray(x)
m = float(x.mean())
v = float(x.var(ddof=0))
mu1_hat = 0.5 * (v + m)
mu2_hat = 0.5 * (v - m)
return max(mu1_hat, 0.0), max(mu2_hat, 0.0)
mu1, mu2 = 8.0, 5.0
mom = skellam_moments(mu1, mu2)
samples = rng.poisson(mu1, size=300_000) - rng.poisson(mu2, size=300_000)
{
"formula_mean": mom["mean"],
"mc_mean": float(samples.mean()),
"formula_var": mom["var"],
"mc_var": float(samples.var(ddof=0)),
"formula_skew": mom["skew"],
"mc_skew": sample_skewness(samples),
"formula_excess_kurt": mom["excess_kurt"],
"mc_excess_kurt": sample_excess_kurtosis(samples),
"entropy_bits (approx)": skellam_entropy(mu1, mu2, base=2),
}
{'formula_mean': 3.0,
'mc_mean': 3.001663333333333,
'formula_var': 13.0,
'mc_var': 12.952040566655556,
'formula_skew': 0.06400386879521874,
'mc_skew': 0.06958811620543011,
'formula_excess_kurt': 0.07692307692307693,
'mc_excess_kurt': 0.06572870545169307,
'entropy_bits (approx)': 3.896700141704304}
5) Parameter Interpretation#
You can think of (\mu_1) as the expected number of positive events and (\mu_2) as the expected number of negative events.
Location / drift: (\mathbb{E}[X] = \mu_1 - \mu_2). Increasing (\mu_1) shifts mass to the right; increasing (\mu_2) shifts mass to the left.
Dispersion: (\mathrm{Var}(X) = \mu_1 + \mu_2). Increasing both parameters makes the distribution wider.
Symmetry: if (\mu_1=\mu_2), the distribution is symmetric around 0.
Asymptotics: for large (\mu_1+\mu_2), the PMF looks increasingly bell-shaped (Normal approximation).
The plot below compares several parameter settings.
param_sets = [
(5.0, 5.0),
(10.0, 5.0),
(5.0, 10.0),
(20.0, 20.0),
]
alpha = 0.001
lo = int(min(stats.skellam.ppf(alpha, m1, m2) for m1, m2 in param_sets))
hi = int(max(stats.skellam.ppf(1 - alpha, m1, m2) for m1, m2 in param_sets))
ks = np.arange(lo, hi + 1)
fig = go.Figure()
for m1, m2 in param_sets:
pmf = stats.skellam.pmf(ks, m1, m2)
fig.add_trace(
go.Scatter(
x=ks,
y=pmf,
mode="lines+markers",
name=f"mu1={m1:g}, mu2={m2:g}",
)
)
fig.update_layout(
title="Skellam PMF for different (mu1, mu2)",
xaxis_title="k",
yaxis_title="P(X=k)",
)
fig.show()
6) Derivations#
A) Expectation#
Using linearity of expectation and the fact that a Poisson((\mu)) variable has mean (\mu): [ \mathbb{E}[X] = \mathbb{E}[Y_1 - Y_2] = \mathbb{E}[Y_1] - \mathbb{E}[Y_2] = \mu_1 - \mu_2. ]
B) Variance#
Independence gives (\mathrm{Cov}(Y_1, Y_2)=0). Poisson variables have variance equal to their mean, so [ \mathrm{Var}(X) = \mathrm{Var}(Y_1 - Y_2) = \mathrm{Var}(Y_1) + \mathrm{Var}(Y_2) = \mu_1 + \mu_2. ]
(You can also get both results by differentiating the cumulant generating function (K(t)=\mu_1(e^t-1)+\mu_2(e^{-t}-1)) at (t=0).)
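A numerical version of that remark, using central finite differences of (K(t)) at (t=0) (the step size h = 1e-5 is a judgment call; math.expm1 avoids cancellation in (e^t-1)):
mu1_k, mu2_k = 7.0, 4.0
def K(t):
    return mu1_k * math.expm1(t) + mu2_k * math.expm1(-t)
h = 1e-5
k1 = (K(h) - K(-h)) / (2.0 * h)            # first cumulant, ~ mu1 - mu2 = 3
k2 = (K(h) - 2.0 * K(0.0) + K(-h)) / h**2  # second cumulant, ~ mu1 + mu2 = 11
k1, k2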
C) Likelihood#
For observations (x_1,\dots,x_n\in\mathbb{Z}), the likelihood is [ L(\mu_1,\mu_2) = \prod_{i=1}^{n} \mathbb{P}(X=x_i\mid \mu_1,\mu_2), ] and the log-likelihood (for (\mu_1,\mu_2>0)) is [ \ell(\mu_1,\mu_2) = \sum_{i=1}^{n}\Big[-(\mu_1+\mu_2) + \tfrac{x_i}{2}\log(\mu_1/\mu_2) + \log I_{|x_i|}(2\sqrt{\mu_1\mu_2})\Big]. ] There is no general closed-form MLE; we typically use numerical optimization, often initialized with method-of-moments estimates.
mu1_true, mu2_true = 7.0, 4.0
data = rng.poisson(mu1_true, size=3_000) - rng.poisson(mu2_true, size=3_000)
mu1_mom, mu2_mom = skellam_mom_estimates(data)
def nll(log_params):
mu1 = math.exp(log_params[0])
mu2 = math.exp(log_params[1])
return -stats.skellam.logpmf(data, mu1, mu2).sum()
x0 = np.log([max(mu1_mom, 1e-6), max(mu2_mom, 1e-6)])
res = optimize.minimize(nll, x0=x0, method="L-BFGS-B")
mu1_mle, mu2_mle = np.exp(res.x)
{
"mu1_true": mu1_true,
"mu2_true": mu2_true,
"mu1_mom": mu1_mom,
"mu2_mom": mu2_mom,
"mu1_mle": float(mu1_mle),
"mu2_mle": float(mu2_mle),
"success": bool(res.success),
}
{'mu1_true': 7.0,
'mu2_true': 4.0,
'mu1_mom': 6.923159111111111,
'mu2_mom': 3.904492444444444,
'mu1_mle': 6.928769718257335,
'mu2_mle': 3.9101030817961346,
'success': True}
7) Sampling & Simulation#
The defining construction immediately gives a simple sampler:
draw (Y_1 \sim \text{Pois}(\mu_1))
draw (Y_2 \sim \text{Pois}(\mu_2)) independently
return (X = Y_1 - Y_2)
The sampler is NumPy-only because NumPy's random Generator includes a Poisson sampler (rng.poisson).
def sample_skellam_numpy(mu1, mu2, size, *, rng):
mu1, mu2 = _validate_mu1_mu2(mu1, mu2)
y1 = rng.poisson(lam=mu1, size=size)
y2 = rng.poisson(lam=mu2, size=size)
return y1 - y2
mu1, mu2 = 3.5, 2.0
x = sample_skellam_numpy(mu1, mu2, size=200_000, rng=rng)
{
"sample_mean": float(x.mean()),
"theory_mean": mu1 - mu2,
"sample_var": float(x.var(ddof=0)),
"theory_var": mu1 + mu2,
}
{'sample_mean': 1.49472,
'theory_mean': 1.5,
'sample_var': 5.4990721216,
'theory_var': 5.5}
8) Visualization#
We’ll visualize:
the PMF over a high-probability range (a finite window covering most mass)
the CDF as a step function
Monte Carlo samples vs the exact PMF
mu1, mu2 = 9.0, 6.0
alpha = 0.001
k_min = int(stats.skellam.ppf(alpha, mu1, mu2))
k_max = int(stats.skellam.ppf(1 - alpha, mu1, mu2))
ks = np.arange(k_min, k_max + 1)
pmf = stats.skellam.pmf(ks, mu1, mu2)
cdf = stats.skellam.cdf(ks, mu1, mu2)
fig_pmf = go.Figure()
fig_pmf.add_trace(go.Bar(x=ks, y=pmf, name="PMF"))
fig_pmf.update_layout(
title=f"Skellam PMF (mu1={mu1}, mu2={mu2})",
xaxis_title="k",
yaxis_title="P(X=k)",
)
fig_pmf.show()
fig_cdf = go.Figure()
fig_cdf.add_trace(go.Scatter(x=ks, y=cdf, mode="lines", line_shape="hv", name="CDF"))
fig_cdf.update_layout(
title=f"Skellam CDF (mu1={mu1}, mu2={mu2})",
xaxis_title="k",
yaxis_title="P(X≤k)",
)
fig_cdf.show()
mc = sample_skellam_numpy(mu1, mu2, size=200_000, rng=rng)
mask = (mc >= k_min) & (mc <= k_max)
counts = np.bincount(mc[mask] - k_min, minlength=len(ks))
pmf_hat = counts / mc.size
fig_mc = go.Figure()
fig_mc.add_trace(go.Bar(x=ks, y=pmf_hat, name="Monte Carlo", opacity=0.6))
fig_mc.add_trace(go.Scatter(x=ks, y=pmf, mode="markers+lines", name="Exact PMF"))
fig_mc.update_layout(
title=f"Monte Carlo vs exact PMF (mu1={mu1}, mu2={mu2})",
xaxis_title="k",
yaxis_title="Probability",
)
fig_mc.show()
{
"prob_mass_in_plot_window": float(pmf.sum()),
"mc_mass_in_plot_window": float(((mc >= k_min) & (mc <= k_max)).mean()),
}
{'prob_mass_in_plot_window': 0.9984984715902749,
'mc_mass_in_plot_window': 0.998655}
9) SciPy Integration#
SciPy provides a numerically robust implementation via scipy.stats.skellam.
Use pmf, cdf, sf, rvs, logpmf, logcdf, …
Many rv_discrete objects (including skellam) do not expose a .fit() method. In SciPy 1.9+, you can use the generic scipy.stats.fit for MLE with bounds.
skellam also has a loc parameter (integer shift). In most modeling contexts you want loc = 0.
skellam = stats.skellam
mu1, mu2 = 5.0, 3.0
alpha = 0.001
k_min = int(skellam.ppf(alpha, mu1, mu2))
k_max = int(skellam.ppf(1 - alpha, mu1, mu2))
ks = np.arange(k_min, k_max + 1)
pmf = skellam.pmf(ks, mu1, mu2)
cdf = skellam.cdf(ks, mu1, mu2)
samples = skellam.rvs(mu1, mu2, size=10_000, random_state=rng)
mom = skellam_moments(mu1, mu2)
# Fit with scipy.stats.fit (MLE)
data = skellam.rvs(7.0, 4.0, size=2_000, random_state=rng)
fit_res = stats.fit(
skellam,
data,
bounds={"mu1": (0.0, 50.0), "mu2": (0.0, 50.0), "loc": (0.0, 0.0)},
)
{
"pmf_sum_on_grid": float(pmf.sum()),
"cdf_last_on_grid": float(cdf[-1]),
"sample_mean": float(samples.mean()),
"theory_mean": mom["mean"],
"sample_var": float(samples.var(ddof=0)),
"theory_var": mom["var"],
"fit_success": bool(fit_res.success),
"fit_mu1": float(fit_res.params.mu1),
"fit_mu2": float(fit_res.params.mu2),
"fit_loc": float(fit_res.params.loc),
}
{'pmf_sum_on_grid': 0.998847831553688,
'cdf_last_on_grid': 0.9992059941388924,
'sample_mean': 1.999,
'theory_mean': 2.0,
'sample_var': 7.929998999999999,
'theory_var': 8.0,
'fit_success': True,
'fit_mu1': 7.284579179865933,
'fit_mu2': 4.275579077706104,
'fit_loc': 0.0}
10) Statistical Use Cases#
A) Hypothesis testing (difference of Poisson means)#
If you have a model for two independent Poisson counts, Skellam gives the exact distribution of the difference. This can be used for exact p-values on a score differential or “net count”.
B) Bayesian modeling (two Poisson rates)#
In many applications you model (Y_1) and (Y_2) directly with Poisson likelihoods and put Gamma priors on (\mu_1, \mu_2). The induced predictive distribution for (X=Y_1-Y_2) is then a mixture of Skellam distributions.
C) Generative modeling (integer-valued innovations)#
Skellam is a convenient choice when you need integer noise that can be negative, e.g. in state-space models or random walks with drift.
# Example: test whether there is an advantage (H0: mu1 = mu2)
mu0 = 1.4
d_obs = 3
# Two-sided p-value under a symmetric Skellam(mu0, mu0)
p_upper = stats.skellam.sf(d_obs - 1, mu0, mu0) # P(X >= d_obs)
p_lower = stats.skellam.cdf(-d_obs, mu0, mu0) # P(X <= -d_obs)
p_two = p_upper + p_lower
# Conditional exact test given N = Y1 + Y2 (does not depend on mu0 under H0)
y1, y2 = 4, 1
n = y1 + y2
binom_res = stats.binomtest(y1, n=n, p=0.5, alternative="two-sided")
{
"skellam_two_sided_p": float(p_two),
"conditional_binomtest_p": float(binom_res.pvalue),
"note": "Under H0 mu1=mu2, conditioning on N=Y1+Y2 gives a Binomial test.",
}
{'skellam_two_sided_p': 0.12687630672173225,
'conditional_binomtest_p': 0.375,
'note': 'Under H0 mu1=mu2, conditioning on N=Y1+Y2 gives a Binomial test.'}
# Simple Bayesian example (Gamma–Poisson) + posterior predictive for the score difference
y1_obs = np.array([2, 1, 0, 3, 1, 2])
y2_obs = np.array([1, 0, 1, 1, 2, 1])
n_games = len(y1_obs)
# Prior: mu ~ Gamma(alpha, beta) with beta = rate (so scale = 1/beta)
alpha1, beta1 = 1.0, 1.0
alpha2, beta2 = 1.0, 1.0
post_alpha1 = alpha1 + y1_obs.sum()
post_beta1 = beta1 + n_games
post_alpha2 = alpha2 + y2_obs.sum()
post_beta2 = beta2 + n_games
m = 50_000
mu1_s = rng.gamma(shape=post_alpha1, scale=1.0 / post_beta1, size=m)
mu2_s = rng.gamma(shape=post_alpha2, scale=1.0 / post_beta2, size=m)
# Posterior predictive for next-game difference (mixture of Skellam)
y1_pred = rng.poisson(mu1_s)
y2_pred = rng.poisson(mu2_s)
d_pred = y1_pred - y2_pred
k_lo = int(np.percentile(d_pred, 0.5))
k_hi = int(np.percentile(d_pred, 99.5))
ks = np.arange(k_lo, k_hi + 1)
mask = (d_pred >= k_lo) & (d_pred <= k_hi)
counts = np.bincount(d_pred[mask] - k_lo, minlength=len(ks))
pmf_hat = counts / d_pred.size
fig = go.Figure()
fig.add_trace(go.Bar(x=ks, y=pmf_hat, name="posterior predictive"))
fig.update_layout(
title="Posterior predictive distribution of difference (Gamma–Poisson model)",
xaxis_title="k",
yaxis_title="Probability",
)
fig.show()
{
"posterior_mean_mu1": float(mu1_s.mean()),
"posterior_mean_mu2": float(mu2_s.mean()),
"predictive_mean_diff": float(d_pred.mean()),
"predictive_var_diff": float(d_pred.var(ddof=0)),
"p(diff>0)": float((d_pred > 0).mean()),
}
{'posterior_mean_mu1': 1.4275728197733406,
'posterior_mean_mu2': 0.9975202827806741,
'predictive_mean_diff': 0.43624,
'predictive_var_diff': 2.7678146623999997,
'p(diff>0)': 0.46646}
# Generative modeling: Skellam innovations in an integer-valued random walk
T = 200
mu_plus, mu_minus = 2.0, 1.5
increments = sample_skellam_numpy(mu_plus, mu_minus, size=T, rng=rng)
path = np.cumsum(increments)
fig = go.Figure()
fig.add_trace(go.Scatter(y=path, mode="lines", name="x_t"))
fig.update_layout(
title="Random walk with Skellam innovations",
xaxis_title="t",
yaxis_title="x_t",
)
fig.show()
{
"drift_theory": mu_plus - mu_minus,
"drift_empirical": float(increments.mean()),
}
{'drift_theory': 0.5, 'drift_empirical': 0.415}
11) Pitfalls#
Invalid parameters: (\mu_1, \mu_2) must be (\ge 0). The boundary case (\mu_1=\mu_2=0) is degenerate (all mass at 0).
Model misspecification: Skellam assumes independent Poisson counts. Correlation or over/under-dispersion can invalidate p-values and CIs. If overdispersion is present, consider modeling (Y_1) and (Y_2) with Negative Binomial (Gamma–Poisson mixture) or adding random effects; a quick dispersion diagnostic appears at the end of this section.
Unequal exposure: if the two counts come from different time windows or populations, interpret (\mu_1) and (\mu_2) as expected counts for those exposures.
Numerical issues: the PMF involves a modified Bessel function (I_{|k|}(\cdot)), which grows roughly like (e^{2\sqrt{\mu_1\mu_2}}). Naive evaluation can overflow; prefer logpmf or scaled Bessel forms (as in special.ive).
mu1, mu2 = 200.0, 190.0
k = 0
z = 2.0 * math.sqrt(mu1 * mu2)
iv_val = special.iv(abs(k), z)
ive_val = special.ive(abs(k), z)
# Naive PMF computation can produce inf/0 -> nan for large parameters
pmf_naive = math.exp(-(mu1 + mu2)) * (mu1 / mu2) ** (k / 2) * iv_val
logpmf_stable = skellam_logpmf(k, mu1, mu2)
logpmf_scipy = stats.skellam.logpmf(k, mu1, mu2)
{
"z": z,
"iv(abs(k), z)": float(iv_val),
"ive(abs(k), z)": float(ive_val),
"pmf_naive": float(pmf_naive) if np.isfinite(pmf_naive) else str(pmf_naive),
"logpmf_stable": float(logpmf_stable),
"logpmf_scipy": float(logpmf_scipy),
}
k_far = 600
{
"k_far": k_far,
"pmf": float(stats.skellam.pmf(k_far, mu1, mu2)),
"logpmf": float(stats.skellam.logpmf(k_far, mu1, mu2)),
}
{'k_far': 600, 'pmf': 2.127058179463747e-171, 'logpmf': -392.98731101331043}
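Finally, the dispersion diagnostic promised under the misspecification pitfall: a Poisson count series should have variance ≈ mean, so a variance/mean ratio far from 1 is a warning sign. A minimal sketch (the simulated rate and the Negative Binomial alternative are arbitrary illustrations):
y_pois = rng.poisson(5.0, size=5_000)                   # ratio ~ 1
y_over = rng.negative_binomial(n=5, p=0.5, size=5_000)  # overdispersed: var = 2 * mean
{
    "poisson_var_over_mean": float(y_pois.var(ddof=1) / y_pois.mean()),
    "negbin_var_over_mean": float(y_over.var(ddof=1) / y_over.mean()),
}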
12) Summary#
Skellam models the difference of two independent Poisson counts: (X=Y_1-Y_2).
Support is all integers; parameters (\mu_1,\mu_2\ge 0).
Key moments: (\mathbb{E}[X]=\mu_1-\mu_2), (\mathrm{Var}(X)=\mu_1+\mu_2), skewness ((\mu_1-\mu_2)/(\mu_1+\mu_2)^{3/2}).
PMF involves a modified Bessel function; use logpmf / scaled Bessel forms for numerical stability.
Sampling is easy: draw two Poisson counts and subtract.
Common uses include score differentials, net event counts, and integer-valued innovations in generative models.